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Abstract 

Cell population heterogeneity can affect cellular response and is a major factor in drug resistance. However, there are few 
techniques available to represent and explore how heterogeneity is linked to population response. Recent high-throughput 
genomic, proteomic, and cellomic approaches offer opportunities for profiling heterogeneity on several scales. We have 
recently examined heterogeneity in vascular endothelial growth factor receptor (VEGFR) membrane localization in 
endothelial cells. We and others processed the heterogeneous data through ensemble averaging and integrated the data 
into computational models of anti-angiogenic drug effects in breast cancer. Here we show that additional modeling insight 
can be gained when cellular heterogeneity is considered. We present comprehensive statistical and computational methods 
for analyzing cellomic data sets and integrating them into deterministic models. We present a novel method for optimizing 
the fit of statistical distributions to heterogeneous data sets to preserve important data and exclude outliers. We compare 
methods of representing heterogeneous data and show methodology can affect model predictions up to 3.9-fold. We find 
that VEGF levels, a target for tuning angiogenesis, are more sensitive to VEGFRl cell surface levels than VEGFR2; updating 
VEGFRl levels in the tumor model gave a 64% change in free VEGF levels in the blood compartment, whereas updating 
VEGFR2 levels gave a 17% change. Furthermore, we find that subpopulations of tumor cells and tumor endothelial cells 
(tEC) expressing high levels of VEGFR (>35,000 VEGFR/cell) negate anti-VEGF treatments. We show that lowering the VEGFR 
membrane insertion rate for these subpopulations recovers the anti-angiogenic effect of anti-VEGF treatment, revealing 
new treatment targets for specific tumor cell subpopulations. This novel method of characterizing heterogeneous 
distributions shows for the first time how different representations of the same data set lead to different predictions of drug 
efficacy. 
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Introduction 

Drug resistance is one of the largest challenges in providing 
effective cancer treatment, and cellular heterogeneity is a main 
component of drug resistance [1]. Cellular heterogeneity can 
regulate systemic response, and mapping these heterogeneities is a 
grand challenge in biomedical research [2]. Heterogeneity also 
represents a challenge in the emerging field of personalized 
medicine since variability in patient populations can result in 
differential therapeutic outcomes [3] . Recent analyses of breast 
cancer cell lines and xenografts on the genetic and proteomic 
levels have offered significant insight into tumor heterogeneity 
[4,5]. Similarly, genetic screens of patient tumor samples have 
highlighted the challenge of tumor heterogeneity in both 
personalized medicine and biomarker development [6]. Under- 
standing how to characterize cellular heterogeneity wiU help 
surmount drug resistance challenges and develop more effective 
cancer treatment approaches. 

Recent efforts to characterize heterogeneity have applied optical 
biosensors [7-9]. We have recently optimized conditions for 
phycoerythrin (PE)-antibody based labeling and profiling the 



vascular endothelial growth factor receptors (VEGFR) on endo- 
thelial cells, in vitro [10] and ex vivo [11-13], with fluorescence 
calibration to commercially available PE beads [14,15]. These 
data showed significant cell-by-cell variation in surface receptor 
levels across similar cell populations. We have also developed new 
quantum dot beads for quantitative calibration of heterogeneity 
[16]. The development of these tools to characterize heterogeneity 
has resulted in a need for better data analysis and model 
incorporation methods. 

Defining standards for statistically characterizing heterogeneous 
data often requires presuppositions on the type of statistical 
distribution. In analyzing flow cytometry data, Boedigheimer and 
Ferbas automated data gating (selection). Using expectation 
maximization, they modeled the data as a mixture of Gaussian 
distributions [1 7] . Various commercial software automatically gate 
with a predefined bivariate normal or ^-distributions [18]. 
However, analyzing heterogeneous data with such presuppositions 
could neglect essential features. As such, methods for statistically 
characterizing heterogeneous data with no predisposition on the 
data distribution are needed. 
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Systems biology ofTers useful approaches for gaming both data- 
driven and in silico insight into heterogeneous biological systems. 
In particular, in silico models have applied sensitivity analysis to 
probe population outliers as a proxy for heterogeneity: Schoeberl 
et al showed that increasing the number of epidermal growth 
factor receptors (EGFR) ten-fold in the presence of external 
epidermal growth factor (EGF) resulted in approximately a 3-fold 
decrease of the amount of extracellular signal-regulated kinase 
(ERK) phosphorylated downstream the signal transduction 
pathway within one hour [19]. Saterbak et al showed that 
heterogeneity in membrane receptors-cell adherence significandy 
alters cell detachment kinetics to an applied shear force [20]. 
While these and other computational models have shown the 
importance to account for receptor heterogeneity, well defined 
methods have not been developed for integrating experimental 
heterogeneity into deterministic models. 

In this study, we present comprehensive statistical and compu- 
tational methods for analyzing cellomic data sets and integrating 
them into deterministic models. Here we show that fitting protein 
data to statistical distributions provides useful parameters for 
describing cellular populations. We show how these' \'ari()us cellular 
pojjulation repres("ntations aifc'ct the vascular endothelial growth 
factor (VEGF) distribution throughout a whole-body model, anti- 
VEGF treatment efficacy, and how these representations may affect 
anti-VEGF treatment at different time points during tumor growth. 
Additionally, we mathematically describe the necessity to target 
treatment techniques to tumor cell subpopulations. 

Materials and Methods 

Experimental receptor data 

VEGFRl, -R2, -R3, and neuropUin-l (NRPl) levels were 
quantified on human umbilical vein endothelial cells (HUVEC) 
acquired from individual donors (Lonza, Walkersville, MD and 
Stem Cell Technologies, Vancouver, Canada), as previously 
described [10]. VEGFRl and -R2 levels were quantified on 
primary mouse endothelial cells, which were freshly isolated from 
gastrocnemius and tibialis anterior of male and female 8-14 week 
old C57BL/6 (Charles River and NCI) and BALB/c (NCI) mice, 
as previously described [11]. VEGFRl and -R2 were quantified 
on tumor endothelial cells and tumor cells obtained from MDA- 
MB-231 xenograft studies, as previously described [13]. These 
MDA-MB-231 cells were kindly provided by Dr. Zaver M. 
Bhujwalla (Johns Hopkins University) with the following details 
about the cell line: MDA-MB-231 breast cancer cells are 
purchased from the American Type Culture Collection (ATCC) 
and used within 6 months of obtaining them from ATCC; the cell 
Une is tested and authenticated by ATCC by two independent 
methods; the ATCC cytochrome C oxidase I PGR assay and short 
tandem repeat profiling using multiplex PGR, this is as previously 
described [13]. No experimental data was acquired in this study. 
Ethics committee approval for data used here can be found in our 
previous publications [10,11,13]. 

Computational Models 

The Popel laboratory has developed a whole-body VEGF 
kinetic and transport model necessary for building VEGF- 
mediated angiogenesis models [21,22]. We and others recendy 
advanced this model to include experimentally quantified VEGFR 
levels on tissue, to explore the pharmacokinetics and pharmaco- 
dynamics associated with an anti-VEGF drug (Fig. 1), and how 
this drug influences systemic, tissue, and tumor VEGF levels [23]. 
This model contains three compartments, normal tissue, blood, 
and diseased tissue, to model whole-body VEGF kinetic interac- 



tions and transport between compartments, developed from 
experimental observations. The model predicts VEGF distribution 
throughout the body in response to the anti-VEGF recombinant 
humanized monoclonal antibody bevacizumab. This model 
previously showed that the anti-VEGF treatment outcome is 
dependent on the tumor microenvironment, such as receptor 
expression, and may be affected by heterogeneities across patients. 
Receptor insertion and trafficking is simplified by assuming a 
constant receptor levels on the cell surface. All receptors are 
assumed as pre-dimerized homodimers, and this model does not 
account for dimerization kinetics or heterodimers. Model compo- 
nents, assumptions, and experimental bases are described in detail 
in previous publications [21-23]. A healthy model excluding the 
tumor compartment was also developed, which includes all of the 
components in the schematic above the "Healthy Body cutoff" 
line (Fig. 1). Luminal and abluminal surfaces of endothelial cells 
are assumed to have the same receptor levels. Cellular heteroge- 
neity is explored within normal physiology and through anti- 
VEGF drug treatment response using these models. An anti- 
VEGF agent is administered in the model as an injection into the 
blood at time t = 0, and all simulations continue to 3 weeks after 
anti-VEGF injection [23]. VEGF levels in response to altering 
model parameters are measured as they directiy correlate with 
angiogenesis occurrence. 

Applying receptor values to the computational model 

Four methods are implemented to condense the data distribu- 
tion into a single VEGFR surface level: median, mode, geometric 
mean, and arithmetic mean. These four methods were chosen as 
they all have different biases towards choosing a representative 
value from th(" population. The mode* and median are biased 
towards values that are repeated frcquc'ntly, the arithmetic mean is 
biased towards the tail of a population, whereas the geometric 
mean creates a balance between the frequency and the range of 
values. VEGFR levels are updated from the extracted value, and 
free VEGF in each compartment is simulated up to 3 weeks after 
injecting an anti-VEGF drug. Updates are done for VEGFRl 
alone, VEGFR2 alone, or both using the same extraction method. 
Free VEGF using parameter updates is compared to control which 
reflects VEGFR levels previously pubhshed (1,100 VEGFRl/ceU 
and 700 VEGFR2/cell) [23]. 

Results 

Developing the low bin search method 

Optimal bin size for plotting a data set as a histogram is 
determined by fitting to three statistical distributions: Weibull, 
Gamma, and lognormal, using a range of bin sizes. These were 
chosen due to their characteristic properties: Weibull is a special 

case of the generalized extreme value distribution which approx- 
imates the maxima of a finite sequence of random variables; 
Gamma is the maximum entropy probability distribution which 
chooses the unknown distribution that exhibits the highest 
entropy; and lognormal fits a distribution whose logarithm follows 
a Gaussian. The three statistical distributions are all two 
parameter distributions, and for a given value x their probability 
density functions/(x)are as follows: 
Weibull: 

f{x;X,k) = \>^^>-> ^ (1) 
[ 0 x<0 
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Figure 1 . Schematic of the compartments and major components included in the VEGF model. The VEGF model contains a normal tissue, 
blood, and diseased tissue compartment. VEGF interacts anti-VEGF, GAG chains, and with its receptors VEGFR1/2, on tumor cells and Myocytes, as 
well as NRP1/2 on endothelial cells. The luminal surface of the endothelial cells resides in the blood compartment, whereas the abluminal side resides 
in the normal tissue compartment on healthy endothelial cells and the diseased tissue compartment on tECs. VEGF is secreted by the tumor in the 
diseased compartment and tissue in the normal tissue compartment. VEGF is also secreted and lymphatically drained to the blood compartment 
where it is cleared. VEGF is also permeable through the blood vessel wall. A thorough description of the model containing a complete list of species 
and parameters can be found in Finley ef a\ [23]. The tumor model contains all of the shown components, while the healthy model contains only 
those above the "healthy model cutoff" line. 
doi:1 0.1 371/journal.pone.0097271 .gOOl 
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where k is the shape parameter, X is the scale parameter, r(^)is 
the gamma function evaluated at A;,^is the mean, andffis the 
standard deviation. The two parameters of each distribution that 
best describe the experimental data are determined using 
MATLAB. Next it is determined which of the three statistical 
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distributions best represents the data by minimizing the sum of 
squared errors (SSE). The SSE for all three distributions is 
calculated by 

SSE=Y,(yi-Yd'' (4) 

1=1 

where n is the total number of bins in the histogram, Yi is the 
number of elements in the bin, and y, is the statistical 
distribution evaluated at x, , the center of the i"' bin. For example, 
for a bin that ranges between 10 and 20 surface receptors,j'iis the 
value given by the statistical distribution evaluated at 15 surface 
receptors. The histogram is the experimentally obtained receptor 
data, and thus y is the true value while yt is the approximated 
value from fitting the statistical distribution. The SSE was used to 
make two decisions: (1) Decide which statistical distribution fit the 
data best. The best fitting distribution had the lowest SSE amongst 
the three distributions given the same number of bins. However, 
the best fitting distribution is not dependent on the number of bins. 
Next, (2) decide the optimal number of bins to define the 
histogram. The optimal bin number was determined by minimiz- 
ing the SSE the best fit distribution gave across a range of 
histograms with different numbers of bins. Thus, if the Gamma 
distribution gave the best fit, the optimal bin numberSwould be 
determined as 

5=imn(S5'£f'™™')Vfe{5,6,...,w} (5) 

where SSEf™""'\i the SSE given by the Gamma distribution fit 
using a number of bins /, where /ranges from 1 to m. Note that m 
is defined as the maximum number of bins desired to test, and we 
used m = 500 bins. Also note that (' starts at 5 bins as starting with a 
smaller number of bins is not practical for representing the data. 

The histogram is using the optimal bin number is then made. 
Each bin of index j in the histogram is centered at fly, has a width 
w, and contains a number of cells bj. Physically, Uj is the median 
surface receptor level in bin j and bj is the number of cells in the 
population whose surface receptor level falls in the range of the bin 
[a/ — J ,aj + j) . Note that all bins have the same width wand that 
aj and w are defined automatically once the optimal bin number is 
specified. The number of cells bj in a bin is determined by 

Z>,= f^c<Vre{(a,- I) <r< («;+ (6) 

i = l 

where Cis the total number of cells in the data set and ris surface 
receptor level the index / cell contains. We define that if ris within 
the bin range ce//[ = 1 , otherwise ce//[ = 0. For example, a bin with 
fly = 7,11' = 4, and bj = 3 mc-ans 3 cells in the population had 
between 5 and 9 surface receptors. 

After defining the optimal bin number, we define outiiers to 
account for unwanted data obtained from flow cytometry, such as 
surface receptor levels outside the physiological range. As the data 
is heterogeneous, we define a cutoff method for specifying outliers 
that utilizes the data shape and the optimal bin number. The 
cutoff is determined by the bin with lowest aj that also meets the 
following two criteria: 

(1) The number of cells in the bin is less than 1% the number of 
cells in the largest bin. For example, if the largest bin has 500 
cells, the cutoff bin must have less than 5 cells. 



(2) The neighboring bins have a number of cells less than 1 % the 
number of cells in the largest bin. 

The first criterion ensures the cutoff bin has low occurrence 
probability, while the second criterion ensures uniqueness. 
Mathematically, the cutoff bin is chosen when the following 
conditions are met: 



Cutoff = 

(7) 

{min (a,) 1 < 0.01 hj < OM-b„„ hj+i< Qm-b„, } 

where bj is the number of cells contained in the bin of index j, bj-i 
and bj+i are the number of cells contained in the neighboring 
bins, and b„, is the number of cells in the largest bin. Once the 
cutoff bin is determined, data within that bin and all bins to the 
right are removed. The cutoff point is d(-fincd using the histogram 
with the optimal bin number. The optimal binning minimizes the 
SSE that the best fit distribution gives across a range of histograms 
with different numbers of bins. Thus, optimal binning ensures 
outliers are best defined using the best data representation. 
Henceforth, we refer to this method as the "low bin search" 
method. Note that the geometric mean, arithmetic mean, mode, 
and median are taken from the complete data set, minus outiiers, 
not from the binned data. For this study, we were only interested 
in defining right hand side outliers, and only bins to the right of the 
largest bin are cutoff candidates. This is due to there being no 
negative data, in addition to cells that either have no or low 
expression levels. Pr(;\ ious studies have shown that low VEGFR 
expression does occur in tip-stalk cell selection, where activation of 
Notchl receptors downregulates VEGFRs [24,25]. Thus, we did 
not consider left hand outliers, as our data is consistent with these 
experimental findings. In summary, low bin search (1) determines 
the best fitting distribution; (2) finds the optimal bin number using 
the best fit distribution; (3) defines a cutoff point using the optimal 
bin number. 

Robustness of low bin search 

Two bootstrapping methods were performed using the 
VEGFRl data obtained from C57BL/6 mice to observe the 
robustness of low bin search. First, low bin search was performed 
on a random data sample to determine the cutoff value (Fig. 2A) 
and the geometric mean (Fig. 2B) after defining outliers. Second, 
randomly selected samples within the complete data set were 
increased by 20% and low bin search determined the cutoff value 
(Fig. 2C) and geometric mean (Fig. 2D). Sample sizes used were 
1,000, 5,000, and 10,000, and 100 trials were run for each sample 
size and bootstrapping pair. The cutoff values found from the 
random sampling fall within a range of 10,000 receptors/cell, 
whereas the geometric means fall within a range of 130 receptors/ 
cell. The cutoff values found after increasing values by 20% fall 
within a range of 2,500 receptors/cell, whereas the geometric 
means fall within a range of 20 receptors/ cell. The smallest and 
largest geometric means for each sampling size and bootstrapping 
pair were used as parameters in the model to determine if 
bootstrapping caused differences in predicted free VEGF levels 
(Table 1). The largest difference in free VEGF in the diseased 
tissue compartment, tximpared to the complete data set, from aU 
these tests deviated by 5.'24:"/o. This deviation is negligible, and low 
bin search is therefore considered robust for our purposes. 
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Defining outliers in the HUVEC data 

We developed and tested a methodology for defining outlying 
data by examining cell-by-cell VEGFR3 data treated with 1 nM 
VEGF-A155 (Fig. 3). These data naturally exhibit a heavy taU, 
which prevents fitting to WeibuU and Gamma distributions 
(Fig. 3A). After removing outliers using low bin search, described 
in the Materials and Methods and a cutoff of 2%, which gave the 
cutoff at 22,000 receptors/cell, the tail was reduced and the 
Weibull and Gamma distributions better fit the data (Fig. 3B). A 
common data cut-off approach of defining outliers as three 
standard deviations above the mean [26] also resulted in a worse 
fit than that given by low bin search to Weibull and Gamma 
distributions (Fig. 3G). We found that the minimum SSE for the 
Gamma distribution with no outfiers removed was approximately 
95 times larger than the minimum SSE after implementing low bin 
search. Removing three standard deviations above the mean gave 
an approximately 5 times larger minimum SSE than that using 
low bin search. Likewise for the Weibull distribution, removing no 
outliers gave an approximately 64 times larger, and removing 3 
standard deviations above the mean gave an approximately 4 
times larger minimum SSE than that using low bin search. 
Comparing the number of data points defined as outliers between 
low bin search and three standard deviations shows that low bin 
search removed 7.59% more raw data, at most, over all the data 
sets (Table SI). 

HUVEC surface receptor data 

The VEGFRl, -2, -3, and NRPl surface expression on 
HUVECs represent distributions that are not normally distributed 



(Fig. 4). As such, this data is not represented best by the arithmetic 
mean in computational modeling. Therefore, we examined fittings 
to Gamma, Weibull, and lognormal distributions (Fig. 4). Impor- 
tant features revealed by these fittings include the following: 
VEGFRl, -2, and -3 distributions are positively skewed and best fit 
to the lognormal distribution (Fig. 4A-C); whereas NRPl is best fit 
to the Gamma distribution (Fig. 4D). For comparison, we fit the 
receptor data to the Gaussian distribution (Fig. 4). For VEGFRl, 
the minimum SSE given by the Gaussian fit was approximately 
1,700 times larger than the minimum SSE given by the lognormal 
fit. Likewise for VEGFR2, the Gaussian SSE was approximately 
440 times larger than the lognormal SSE, approximately 1 30 times 
larger for VEGFR3, and approximately 45 times larger than die 
Gamma distribution fit to NRPl. NRPl levels are an order of 
magnitude higher than any VEGFR level (Table 2). VEGFRl/2 
surface expression on C57BL/6 and BALB/c mice were also fit to 
Gamma, WeibuU, and lognormal distributions (Fig. SI, Table 2). 
For several distributions, such as VEGFR2 on HUVECs, the 
mode is larger than the other three representative parameters. 
This is because these parameters were taken from the complete 
data set and not the binned data; thus, the mode may not 
necessarily be contained in the largest bin. Other than these cases, 
the arithmetic mean is the largest parameter as it has the most bias 
towards the heavy taU of distributions. Since the geometric mean 
accounts for the taU without biasing it, we choose it as the best 
representing parameter. For comparing these parameters, we wlU 
compare the geometric mean to the mode as those have the largest 
discrepancy in value. 
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Figure 2. Bootstrapping low bin search. The cutoff value and geometric mean determined from low bin search (A-B) applied to a random subset 
(A-B) and after adding 20% error to random data (C-D) using the VEGFRl data set from C57BL/6 mice. Sample sizes are 1,000, 5,000, and 10,000. Each 
sample size and bootstrapping method pair contains data from 100 trials, however, due to point clustering, the plots may appear to contain less 
trials. Red lines show the cutoff (15,642 rec/cell) and geometric mean (1,053 rec/cell) given by low bin search using the complete data set. 
doi:1 0.1 371/journal.pone.0097271 .g002 
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VEGFR1 affects both initial and quasi-steady state free 
VEGF concentrations 

VEGFRl and VEGFR2 levels on healthy endothelial cells were 
updated within the tumor model to the levels found on endothehal 
cells of C57BL/6 mice. Changes were made sequentially and in 
combination to observe the sensitivity of the drug treatment to 
each specific receptor (Fig. 5). VEGFRl is a greater effector of free 
VEGF levels in each compartment: we observe a 40% decrease in 
initial free VEGF levels, before anti-VEGF treatment, in the 
normal tissue compartment. This occurs when VEGFR 1 levels on 
healthy endothelial cells are updated from 1,100 VEGFRl /cell 
(550 abluminal VEGFRl /ceU and 550 luminal VEGFRl /cell), 
which was used in the previous model [23] and represented 
ensemble averaged VEGFRl levels on both C57BL/6 and BALB/ 
c endothelial cells [11], to 2,110 VEGFRl/cell (1,055 abluminal 
VEGFRl/ceU and 1,055 luminal VEGFRl/cell), which repre- 
sents the C57BL/6 geometric mean (Table 2). VEGFRl -mediated 
decrease in initial free VEGF concentrations in the normal tissue 
compartment gave a 40% decrease relative to control, compared 
to only a 5% decrease when updating VEGFR2 alone. When both 
VEGFRl and VEGFR2 are updated on the healthy endothehal 
cells, we observe a 42% decrease in the initial free VEGF levels, 
further confirming that VEGFRl has a greater effect on free 
VEGF levels (Fig. 5A). We also observe a 1.7-fold decrease in 
initial free VEGF levels relative to control. These updates to the 
normal tissue compartment also affected the blood compartment, 
where free VEGF levels were decreased by 64% when updating 
VEGFRl alone, decreased by 17% when updating VEGFR2 
alone, and decreased by 68% when both receptors are updated 
(Fig. 5B). Updates to either or both VEGFRl and VEGFR2 had 
no noticeable effect on initial free VEGF in the diseased tissue 
compartment (Fig. 5C). Free VEGF reached quasi-steady state 3 
weeks after anti-VEGF administration. In the normal tissue 
compartment, this quasi-steady state level is decreased from 
control levels by 22% when updating VEGFRl alone. The quasi- 
steady state free VEGF levels only exhibited a modest decrease of 
4% when updating VEGFR2 alone, and a total decrease of 25% 
when both receptors are updated (Fig. 5A). In the blood 
compartment, this quasi-steady state level is decreased from 
control levels by 28% when updating VEGFRl alone, decreased 
only 9% when updating VEGFR2 alone, and had a total decrease 
of 35% when both receptors are updated (Fig. 5B). In the diseased 
tissue compartment, this quasi-steady state level is decreased from 
control levels by 25% when updating VEGFRl alone, decreased 
only 5 % when updating VEGFR2 alone, and had a total decrease 
of 28% when both receptors are updated (Fig. 5C). 

Representation of VEGFR levels effects model predictions 

Since slight changes in VEGFR levels cause varying effects on 
the initial and quasi-steady state free VEGF concentrations, we 
next survey the data representation approaches with regards to 
their predicted anti-VEGF drug treatment efficacy. This was 
achieved by examining how updating both VEGFRl and 
VEGFR2 controls the fold change in free VEGF levels after 

. TrT^z-iT--- • • 1 • T If t^^^^]/ — quasi— Steady State 

anti-VEGF injection, which we deiine as: — . 

[VEGF\,^Q 

An increase in free VEGF levels after anti-VEGF injection is 
defined as a positive change, while a decrease is defmed as a 
negative change. We observe a larger magnitude change in free 
VEGF in the blood and normal tissue compartments relative to 
control when the model is updated to reflect the VEGFRl and 
VEGFR2 mode, median, geometric mean, and arithmetic mean 
on healthy endothelial cells from C57BL/6 (Fig. 6A) and BALB/c 
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Figure 3. Effectiveness of low bin search. Effect of low bin search on distribution fitting using HUVEC data of VEGFR3 treated with VEGF-A. (A) 
Weibull and Gamma distributions were unable to fit the raw data. (B) After implementation of the cutoff method, Weibull and Gamma distributions 
were able to fit the data. (C) Comparison to removing all data 3 standard deviations above the mean, which is also unable to properly fit the data. 
Goodness of fit was measured by the combined sum of squared error of each statistical distribution. 
doi:1 0.1 371 /journal.pone.0097271 .g003 



(Fig. 6B) mice. AH receptor levels give a negative free VEGF fold 
change in the diseased tissue compartment, which indicate 
decreased VEGF levels. 



We observe that the choice of data representation alters the 
predicted change in free VEGF concentrations, with the largest 
difference in predicted treatment efficacy given between the mode 
and arithmetic mean. We predict a 3.1 fold change in VEGF 
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Figure 4. Statistical distribution fits to in vitro receptor populations. Cell-by-cell analysis of (A) VEGFR1, (B) VEGFR2, (C) VEGFR3, and (D) NRP1 
distributions on in vitro human endothelial cells. Each distribution was fit to Weibull (generalized extreme value distribution), Gamma (maximum 
entropy probability distribution), and lognormal (logarithm is normally distributed) probability density functions. The parameters for the best fit 
distributions are given in Table 2. 
doi:1 0.1 371/journal.pone.0097271 .g004 
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3 .S I BALB/c data give similar results, predicting a 3.9 fold change in 

S .ii o VEGF concentration in the blood compartment when updating 

g .s the arithmetic mean into the model (3,850 VEGFRl/ceU and 

I I E 2,690 VEGFR2/cell) (Table 2); whereas, we predict a 1.6 fold 

change in the blood compartment with the mode (1,700 
VEGFRl/ceU and 1,200 VEGFR2/cell) and a 2.7 fold change 
with the geometric mean (2,700 VEGFRl/ceU and 1,900 
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"' o £ best representation of the receptor levels, and thus the most 
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g 7-1 o The normal tissue and blood compartments give lower magnitude 

free VEGF levels compared to control for aU simulations 
performed (Fig. 7). Initial free VEGF levels, relative to normalized 
control levels, using in vitro HUVEC data is notably different from 
the ex vivo C57BL/6 mouse data in the normal tissue 
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^ <S S I ;5 ceU data. When subpopulations are quahtatively observed, fitting 

to mixtures is a common approach [27,28]. We have previously 



VEGF concentrations versus time are also given (Fig. S2). Having 
added ex vivo VEGFRs to the normal tissue and in vitro and ex 
vivo VEGFRs in a tumor-free model, we compare adding in vitro 
and ex vivo VEGFRs to the tumor model. Free VEGF levels 
obtained using ex vivo versus in vitro VEGFRl in the tumor 
model differ by 54% at most with the mode and by 25% at most 
with the geometric mean (Fig. S3). 



Tumor growth increases free VEGF levels 

To observe how VEGF levels may change with tumor 
progression, we simulate 0.62 cm'' and 1.45 cm'' tumor volumes, 
S ^ corresponding to tumors that have been growing for 3 weeks and 6 

S " tii o weeks, respectively. The tumor size and VEGFRl /2 surface levels 



_ ^ o £ on tumor cells and tumor endothelial cells (tEC) were obtained 

S ^ o K from mouse xenografts [13]. It is observed that receptor 

iS "I 8 distributions on the tumor ceUs and tEC exhibited multiple 
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2 g; I £ to these subpopulations, these data cannot be fit by the continuous 
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Figure 5. Effect of VEGFR1 and VEGFR2 levels on anti-VEGF efficacy. Comparison of updating the model by adding experimental VEGFR1 
levels only, VEGFR2 levels only, and both compared to the control. Updated VEGFR1 and VEGFR2 values represent geometric means of C57BL/6 
distributions (2,100 VEGFRI/cell and 1,540 VEGFR2/cell). The control reflects previously published VEGFR1 and VEGFR2 levels (1,100 VEGFRI/cell and 
700 VEGFR2/cell) [23]. Free VEGF concentration is shown in (A) the normal tissue compartment, (B) the blood compartment, and (C) the diseased 
tissue compartment. An optimized anti-VEGF agent is added at t = 0 and the VEGF concentration response is simulated to 3 weeks after injection. 
doi:10.1371/journal.pone.0097271.g005 



modeled these populations as mixtures and shown that they fit well 
to the multi-component lognormal mixture model [13]. However, 
prior computational models have represented cellular data using 



the arithmetic average. Thus to arrive at a more accurate 
quantitative tumor cell and tEC data representation, we examine 
tri-modal Gaussian mixture models here. While non-Gaussian 
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Figure 6. Effect of receptor levels on free VEGF fold change after anti-VEGF treatment. Fold change in free VEGF levels in response to anti- 
VEGF treatment using different representative receptor levels from the (A) C57BL/6 (C57) and (B) BALB/c (BAL) mice data. VEGFR1 and VEGFR2 levels 
were both updated in the model with geometric mean (2,100 VEGFR1/C57, 1,540 VEGFR2/C57, 2,700 VEGFR1/BAL, 1,900 VEGFR2/BAL), arithmetic 
mean (2,970 VEGFR1/C57, 2,180 VEGFR2/C57, 3,850 VEGFR1/BAL, 2,690 VEGFR2/BAL), mode (1,820 VEGFR1/C57, 2,860 VEGFR2/C57, 1,700 VEGFR1/ 
BAL, 1,200 VEGFR2/BAL), and median (2,050 VEGFR1/C57, 1,510 VEGFR2/C57, 2,650 VEGFR1/BAL, 1,800 VEGFR2/BAL). Fold change is relative to 
control fold change, reflecting the normalized fold change obtained using previously published receptor levels (1,100 VEGFRI/cell and 700 VEGFR2/ 
cell) [23]. 

doi:1 0.1 371/journal.pone.0097271 .g006 
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Figure 7. Receptor level effect on steady state free VEGF levels in the healthy body model. Free VEGF at steady state in the healthy body 
model by updating VEGFR1 and VEGFR2 levels from the (A) HUVEC, (B) C57BL/6 (C57), and (C) BALB/c (BAL) distributions. VEGFR1 and VEGFR2 levels 
were both updated in the model with geometric mean (2,100 VEGFR1/C57, 1,540 VEGFR2/C57, 2,700 VEGFR1/BAL, 1,900 VEGFR2/BAL, 2,530 VEGFR1/ 
HUVEC, 5,260 VEGFR2/HUVEC), arithmetic mean (2,970 VEGFR1 /C57, 2,1 80 VEGFR2/C57, 3,850 VEGFR1 /BAL, 2,690 VEGFR2/BAL, 3,000 VEGFR1 /HUVEC, 
6,950 VEGFR2/HUVEC), mode (1,820 VEGFR1/C57, 2,860 VEGFR2/C57, 1,700 VEGFR1/BAL, 1,200 VEGFR2/BAL, 2,720 VEGFR1/HUVEC, 11,400 VEGFR2/ 
HUVEC), and median (2,050 VEGFR1/C57, 1,510 VEGFR2/C57, 2,650 VEGFR1/BAL, 1,800 VEGFR2/BAL, 2,500 VEGFR1/HUVEC, 5,350 VEGFR2/HUVEC). The 
control reflects previously published VEGFR1 and VEGFR2 levels (1,100 VEGFRI/cell and 700 VEGFR2/cell) [23]. 
doi:10.1371/journal.pone.0097271.g007 



mixture models, such as the Weibull mixture model [29], have 
more flexibility in shape than the Gaussian mixture model, due to 
both the goodness of fit and simplicity in computational resources, 
we believe the Gaussian mixture model is the best model choice, 
and deem it unnecessary to perform additional fitting to other 
mixture models. Simulations are performed (1) using the tri-modal 
mixture distribution and (2) individually inserting each component 
(mean) comprising the tri-modal mixture. Receptor levels within 
the mbcture model are determined by weighting the individual 
Gaussian distribution parameters with their respective densities 
and summing. The distribution and mixture fits to receptor levels 
on tumor cells and tECs grown for 3 weeks and 6 weeks are shown 
(Fig. 8—9). The densities are ordered with "Density 1" having the 
highest cell frequency and "Density 3" having the lowest 
frequency. As the tumor grows from 3 weeks to 6 weeks, VEGFR 1 
subpopulations on the tumor cells are more highly expressed 
(Fig. 8A, 9A), but VEGFR2 levels only slightly change (Fig. 8B, 
9B). For example, "Density 1" gives the geometric mean at 2,900 
VEGFR2/tumor ceU (Table 3) at week 3 (Fig. 8B), and "Density 
1" gives the geometric mean at 3,150 VEGFRl /tumor cell 
(Table 3) at week 6 (Fig. 9B). VEGFRl /2 levels on tECs decrease 
throughout tumor growth. For example, "Density 1" gives the 
geometric mean at 13,000 VEGFRl /tEC (Table 3) at week 3 
(Fig. 8C), and "Density 1" gives the geometric mean at 600 
VEGFRl /tEC (Table 3) at week 6 (Fig. 9C). Likewise, "Density 
1" g-ives the geometric mean at 1,450 VEGFR2/tEC (Table 3) at 
week 3 (Fig. 8D), and "Density 1" gives the geometric mean at 600 



VEGFR2/tEC (Table 3) at week 6 (Fig. 9D). Gaussian distribu- 
tions corresponding to receptor subpopulations express large 
differences in receptor levels (Table 3). 

At tumors grown to 3 weeks, there are no noticeable changes in 
free VEGF levels in either the normal or the blood compartments 
when only the tEC receptor levels are updated, for all receptor 
distributions (Fig. lOA-B, lOD-E, lOG-H). However, initial free 
VEGF levels before anti-VEGF administration in the diseased 
tissue compartment is highly dependent on the VEGFRl levels on 
tECs (Fig. IOC, lOF, 101). Free VEGF levels in the diseased tissue 
converge to approximately 41 pM after anti-VEGF treatment. 

Interestingly, using the VEGFRl geometric mean on the 
"Density 2" distribution (Fig. 8C), which is 69,500 VEGFRl /cell 
(Table 3), gives an increase in free VEGF levels in the diseased 
tissue after anti-VEGF treatment, corresponding to a 1.64 fold 
change in free VEGF (Fig. IOC). Sensitivity analysis, performed by 
observing free VEGF levels in response to parameter perturba- 
tions, reveals that decreasing the VEGFRl membrane insertion 
rate on the tECs transforms anti-VEGF treatment from a pro- 
angiogenic response to anti-angiogenic (Fig. 1 1). The insertion rate 
defines the rate at which receptors are inserted into the cell 
membrane, such as through receptor trafiicking, making those 
receptors available for ligand binding. Sensitivity analysis also 
reveals that anti-VEGF treatment was ineffective for subpopula- 
tions containing more than 35,000 VEGFRI/cell (data not 
shown). 
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Figure 8. Gaussian mixture model of tECs and tumor ceils at 3 weelcs of tumor growth. Gaussian tri-modal mixture models and the 
individual Gaussian distributions making up the mixture model for (A) VEGFR1 on tumor cells, (B) VEGFR2 on tumor cells, (C) VEGFR1 on tECs, and (D) 
VEGFR2 on tECs at 3 weeks of tumor growth. "Density 1" corresponds to the Gaussian with highest weight in the mixture model, "Density 2" is the 
second highest weight, and "Density 3" is the lowest weight. 
doi:1 0.1 371/journal.pone.0097271 .g008 



As with tumors grown to 3 weeks, tumor grown to 6 weeks show 
no noticeable change in free VEGF levels in the normal and blood 
compartments when only the tEC receptor levels were updated 
(data not shown). Initial free VEGF levels in the diseased tissue 
compartment, before adding anti-VEGF, are dependent on tEC 
receptor levels, but converge to approximately 41 pM after 
administering anti-VEGF (Fig. 12A-C). Free VEGF levels in the 
diseased tissue compartment before anti-VEGF treatment is 
higher in the 6 week tumor than in the 3 week tumor. For 
example, using the geometric mean and updating both tEC 
receptors, the mixture model of the 6 week tumor gave a free 
VEGF concentration of 93.77 pM (Fig. 12C) while the 3 week 
tumor gave 49.24 pM (Fig. 101). The tumor cell receptors are 
updated in a similar fashion for 3 week (Fig. S4) and 6 week 
tumors (Fig. S5). Updating VEGFRl/2 levels on the tumor cells 
either individually or simultaneously gave no noticeable change to 
the free VEGF levels in the normal tissue and blood compartments 
for all receptor distributions, but free VEGF levels are highly 
sensitive to the tumor cell receptor levels in the diseased tissue (Fig. 
S4, S5). 

Remark. The receptor levels and tumor sizes are obtained 
using mouse xenograft models: human tumor inoculated in mouse; 
therefore, the tumor cells are human and the tumor endothelial 
cells are murine. However, all values (mouse and human) are used 
in simulating an adult human. This simplification is acceptable as 
we are only interested in observing trends in the free VEGF levels 
in response to receptor heterogeneity. These observed trends are 
stiU physiologically relevant for humans as we assume the tumor 
size ratio to receptor levels are the same for adult humans. 



Discussion 

In this study, we introduce a novel method for quantitatively 
representing heterogeneous populations, and show how account- 
ing for heterogeneity affects drug efiicacy predictions. We show 
how low bin search simultaneously define outiiers and optimizes 
heterogeneous data binning. This is accomplished by performing 
goodness of fit tests to various statistical distributions, where the 
best fitting statistical distribution is determined. The best fitting 
statistical distribution is then used to define outliers and then 
representative parameters from the data are extracted. We used 
these parameters in a tumor angiogenesis model, where we predict 
that free VEGF levels are sensitive to VEGFRl. Using the 
simplified healthy body model, we also observed that free VEGF 
levels are sensitive to VEGFR 1 expression, indicating that cellular 
heterogeneity is essential in both healthy and cancerous angio- 
genesis models. We also predict that anti-VEGF drug efficacy is 
sensitive to subpopulations present in tumor cells and tECs. 

Defining outliers in heterogeneous populations 

We introduced low bin search, a novel method to define outhers 
in a heterogeneous distribution that follows the algorithm: (1) 
determining the best fitting distribution with the minimum SSE; 
(2) finding the optimal bin number by minimizing the SSE the best 
fit distribution gives over a range of histograms; (3) defining the 
cutoff point from the optimal bin number. The cutoff point 
eliminates the first bin that meets two criteria: (1) the number of 
cells in the cutoff bin is less than 1 % the number of cells in die 
largest bin; (2) the neighboring bins also have a number of cells less 
than 1% the number of cells in the largest bin. The theoretical 
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Figure 9. Gaussian mixture model of tECs and tumor ceils at 6 weelcs of tumor growtKi. Gaussian tri-modal mixture models and the 
individual Gaussian distributions making up the mixture model for (A) VEGFR1 on tumor cells, (B) VEGFR2 on tumor cells, (C) VEGFR1 on tECs, and (D) 
VEGFR2 on tECs at 6 weeks of tumor growth. "Density 1" corresponds to the Gaussian with highest weight in the mixture model, "Density 2" is the 
second highest weight, and "Density 3" is the lowest weight. 
doi:1 0.1 371 /journal.pone.0097271 .g009 



basis is as follows: the first criterion ensures the cutofl!' bin has low 
occurrence probability, while the second criterion ensures 
uniqueness. Thus, as low bin search is an algorithmic approach 
for defining outliers with a theoretical basis for defining the cutoff 
point, we define it as a method as opposed to a heuristic rule, 
which pertains to trial-and-error methods of problem solving when 
an algorithmic approach is infeasible or a theoretical basis is 
lacking. This novel method represents an important development, 
since outlier detection is essential for removing unwanted data or 
finding features that occur at low probability. Outlier detection has 
been tackled for mass spectrometry by calculating the Mahalano- 
bis distance a distribution gives to all other experimental runs and 
eliminating those with suspiciously high distances [30]. Another 
algorithm uses projection and quantUe regression to discard values 
that do not follow the general trend given by the data set [31], but 
is computationally intensive for many data points. These methods 
are useful for removing experimental runs that exhibit high 
variation, but not for defining oudiers within a data set. Outliers in 
gene expression can be defined by incorporating probability 
distribution functions and comparing to multiple experimental 
runs [32]. However, this method for gene expression requires a 
priori knowledge for population distribution functions, which may 
not be available. The low bin search method we present requires 
no a priori knowledge in regards to what statistical distribution 
represents the data, and wiU define outliers within any type of 
distribution. 

Automatic gating and outlier removal in flow cytometry 

Flow cytometry is a high-throughput tool for cell-by-cell 
analysis, and requires gating to separate cell populations. 



Bashashati and Brinkman have emphasized the need to completely 
automate flow cytometry data analysis, and provided a theoretical 
framework for this analysis [33]. However, few methods for 
automatically gating flow cytometry data and defining outliers 
within gated data exist. One such method for automated gating is 
cluster analysis, which groups cells with similar fluorescence 
patterns, such as by K-means clustering [34,35]. Density based 
merging places the flow cytometry data on a 2-dimensional grid, 
and groups data based on grid point densities [36]. K-means 
cluster is limited in that clusters must be convex, and thus neglect 
subpopulation features exhibited by any other shape. Density 
based merging is limited to gating 2-dimensional data, such as data 
containing forward scatter and side scatter information only, and 
thus cannot gate data with more than two features. There is a need 
to expand on current methods to gate high dimensional data and 
better capture subpopulations. Thus, due to current automated 
gating limitations, gating typically involves manually selecting cell 
populations for these data as we have done in the past [10-13]. 
The low bin search method we developed could be implemented 
in addition to automatic gating to search through and remove 
oudiers within a gate. This would provide less manual pre- 
processing of flow cytometry data, which is time consuming and 
subject to human error and reproducibility issues. 

Binning optimization 

Low bin search presented here bins data based on the data 
shape and does not require transforming statistical distributions. 
Binning is important as it reduces data dimensionality and 
organizes data based on similar features. Mathematic operations 
performed on binned data, such as fitting statistical distributions. 
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are affected by the binning method. Thus, it is important to 
optimize the number of bins sucli that statistical distributions can 
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>^ for n data points, respectively. To improve these current 

■£ S approaches, it is necessary to develop methods that bin based on 

^ ^ specific data features, in order to best represent each data set 

g S individually. Thus, we created low bin search to bin and represent 

S- ° data sets that optimizes the fit to the best fitting statistical 

o S distribution. For comparison, low bin search determined that 

^~ ^ NRPl levels on HUVECs are optimally distributed over 101 bins, 

. - S whereas MATLAB would use 1 72 bins and Palisade would use 44 

bins. Using these bin numbers, the total error was calculated as the 
sum of the individual SSE from the Weibull, Gamma, and 
P ^ Z3 lognormal fits. For this example, MATLAB gave a total error 

24.71% larger than that given by low bin search, while Palisade 
E 5 gave a total error 77.41% larger than low bin search. Probability 

Binning is another binning method, which splits up the data such 
that every bin may not have the same width, but contains an equal 
number of events [37]. However, this method shapes the 
histogram into a uniform distribution, and requires transforming 
statistical probability distributions for goodness of fit tests. Low bin 
3 * search not only effectively defines outliers, but also optimizes data 

binning. 
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Determining the best fitting statistical distributions 

We show that it is necessary to determine the goodness of fit a 



<U ^ 

^ cfl 

g < statistical distribution gives to a heterogeneous data set to find the 



best fitting distribution. There are many metrics for testing the 
goodness of fit between observed data and statistical distributions. 
We chose SSE as the test metric as it weights every bin equally. By 
changing the number of bins, the cluster mean changes without 
altering the best fitting distribution, allowing the SSE using 
different bin numbers to be compared. We optimize the bin 
number with the minimum SSE, and the oudiers are then defined. 
Thus, using SSE to measure goodness of fits not only determines 
the best fit distribution, but simultaneously defines outliers. For 
binned data, SSE compares the observed hits in a bin to that 
expected from a statistical distribution, using the bin center. 
^ ^ Another test metric, Chi-squared, differs as it weights each bin by 

the number of hits it contains [38]. Overfitting is a significant 
problem that negates predictive power, and results in bad data 
fitting. Overfitting occurs when a statistical model is overly 
^ o g. complex to explain the data, typically describing random noise or 

^ ^ errors instead. Overfitting generally occurs when there are a 

^ -i^ uj relatively high number of fit parameters to the number of 

E S "c observations. To prevent overfitting, low bin search performs the 

c N = following: (1) finds the best fitting distribution; for example, in the 

c g case of VEGFR3 this best fitting distribution is the lognormal 

I S E ^ distribution (Fig. 4C); (2) optimizes the number of bins by 

°^ ^ ^ minimizing the SSE given by the best fitting distribution (Eqn 5); 

£ 5 -g o (3) determines the cutoff point (Eqn 7). The optimal bin number is 

I "I E ^ found by minimizing the SSE the best fitting distribution gives, as 

c S~ 3 ^ opposed to minimizing the SEE globally for all distributions, 

£ -D I £ reducing the overfitting risk. Additionally, all data sets contain 

£ _2 :2, greater than 20,000 observations, but rather than attempting to fit 

£ 5 c ^ all observations, we fit to data represented by the optimal bin 

^1 5 number. This further reduces fitting parameters and the overfitting 

^ _£ ^ risk. Further developing low bin search could include comparing 

goodness of fit tests to determine the most robust test metric. 
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Figure 10. Effect of tEC receptor levels on anti-VEGF treatment at 3 weeks of tumor growth. Free VEGF in the normal tissue, blood, and 
diseased tissue compartments in response to anti-VEGF treatment after updating (A-C) VEGFR1 alone, (D-F) VEGFR2 alone, and (G-l) both receptors on 
the tECs at 3 weeks of tumor growth. "Density 1" (Dl) corresponds to the Gaussian with highest weight in the mixture model, whereas "Density 2" 
(D2) is the second highest weight and "Density 3" (D3) is the lowest weight. "IVlixture" was obtained by summing the geometric means of each 
density distribution weighted by their density in the mixture model. The geometric mean was used for all receptor distributions (18,550 VEGFR1/ 
IVlixture, 1,950 VEGFR2/Mixture, 13,000 VEGFR1/D1, 1,450 VEGFR2/D1, 69,500 VEGFR1/D2, 1,100 VEGFR2/D2, 1,200 VEGFR1/D3, 10,900 VEGFR2/D3). 
The control reflects previously published VEGFR1 and VEGFR2 levels (1,100 VEGFRI/tEC and 700 VEGFR2/tEC) [23]. 
doi:10.1371/journal.pone.0097271.g010 



Statistical distributions best fitting data sets 

We have found that VEGFRs best fit to logiiormal distributions 
under all experimental conditions, whereas NRPl best fit to the 
Gamma distribution when untreated and to the Weibull distribu- 
tion when treated with VEGF-A or VEGF-C. We hypothesize 
VEGFRs best fit lognormal distributions because they have heavy 
tails not present in NRPl distributions. The heavy tail shows that 
some cells in the population express receptors at a higher level, 
which may be used by the cell population to eficit a specific 
response. Lacking a heavy tail in the NRPl distributions indicates 
that the cell population keeps NRPl levels relatively consistent 
across the population. Previous studies have shown that NRPl 
expression is important for T-cell stability and survival, and may 
have a similar role in endothelial cells [39]. Additionally, NRPl is 
a co-receptor for VEGF-Aigs, and appears to play a vital role in 
vascular morphogenesis [40] . However, it is not known why NRPl 
is expressed at higher levels than VEGFRs. Additionally, better 
understanding the connection between the best fitting statistical 
distribution and the population role is needed. 



Quantifying a heterogeneous population 

Due to the VEGFRs being best fit to lognormal distributions, 
we chose the geometric mean as the best heterogeneous data 
representation. This is because the geometric mean accounts for 
the heavy tail, unlike the mode or median, but it is not biased 
towards the tail like the arithmetic mean. We do not want to bias 
the heavy tail as it only represents a small subset of the cells in the 
population. For example, VEGFRl on tECs at 3 weeks had a 
heavy tail that accounted for approximately 3% the total cells in 
the population. The three statistical distributions were chosen due 
to their characteristic properties; Weibull is a special case of the 
generalized extreme value distribution which approximates the 
maxima of a finite sequence of random variables; Gamma is the 
maximum entropy probability distribution which chooses the 
unknown distribution that exhibits the highest entropy; and 
lognormal fits a distribution whose logarithm follows a Gaussian. 
Representative parameters were chosen as they are the best 
parameters that represent the three statistical distributions; they 
are calculated the same for each distribution without biasing one 
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Figure 11. The insertion rate of VEGFR1 tunes the anti- 
angiogenic effect of anti-VEGF treatment. Sensitivity analysis of 
the insertion rate of VEGFR1 into the tEC membrane tunes the efficacy 
of anti-VEGF treatment. This example examines "Density 2" at 3 weeks 
of tumor growth giving 69,500 VEGFRI/tEC. The insertion rate using 
69,500 VEGFRI/tEC is approximately 1e-14 s"\ where anti-VEGF 
treatment provides a pro-angiogenic response. Decreasing the insertion 
rate allows for anti-VEGF treatment to provide an anti-angiogenic 
response. 

doi:1 0.1 371 /journal.pone.0097271 .gO1 1 

distribution over the other. These representation parameters are 
thus generalized for any heterogeneous data set and require no a 
priori knowledge. Expanding low bin search to include more 
statistical distributions will allow for more accurate heterogeneous 
data representations. 

VEGFR1 levels control free VEGF levels 

Free VEGF levels are more sensitive to VEGFRl than 
VEGFR2: using the C57BL/6 geometric mean from healthy 
endothelial cells, we observed that updated VEGFRl alone 
resulted in a steady state free VEGF concentration change of 
24.6% compared to control in the diseased tissue, while VEGFR2 
alone gave a 4.8% change. As VEGFRl and VEGFR2 on the 
healthy endothelial cells are altered, we would expect free VEGF 
levels in the healthy tissue and blood compartments to change as 



those compartments are where the healthy endothelial cell 
receptors are located. Previous studies confirm that free VEGF 
concentrations are more sensitive to VEGFRl than VEGFR2 
levels, due to a higher binding affinity to VEGFRl [10]. As such, 
VEGFRl is typically identified as a decoy receptor [41]. However, 
more insight is necessary into the complete function of VEGFR 1 . 

VEGFRl is important for cell migration 

We have shown a high VEGF binding affinity and sensitivity to 
VEGFRl, indicating that VEGFRl plays a more important role 
than simply being a decoy. For HUVECS, VEGFRl mediates p38 
phosphorylation, which controls cell migration through actin 
reorganization, whereas VEGFR2 mediates ERKl/2 phosphor- 
ylation, which mediates cell proliferation [42,43]. VEGFRl is 
expressed at high levels on BALB/c fibroblasts (36,000 rec/ceU) 
[1 1], and fibroblast migration can be abolished with bevacizumab 
[44]. This VEGFRl migratory function is also observed in 
macrophages [45,46], monocytes [47-49], and endothelial cells 
[50]. If the VEGFRl migratory function observed in these studies 
also translates to the tumor endothelial cells, this would explain the 
high VEGF binding to VEGFRl observed in our computational 
model. 

Several studies have also suggested a role for upregulated 
VEGFRl in migration. In murine hindlimb ischemia, VEGFRl, 
as well as proliferation markers and migration, are upregulated on 
ischemic endothelial cells ten days post ischemic induction; this 
time point coincides with accelerated perfusion recovery to the 
hindlimb [12]. Another study has shown that VEGFRl is 
expressed 3 days after embyroid body induction, a time marked 
by significant migratory phenotype by endothelial PmTcl cell 
lines. VEGFRl expression is sustained through day ten, at which 
point the endotheUal-like cells arrange in a distinct pattern to line 
vessel-like structures [51]. Gastric ulcers were induced in RAG2 
mice, and gastric ulcer healing in VEGFRl knockout mice 
resulted in decreased CD31 mRNA levels and decreased 
microvessel density compared to wUdtype mice, indicating that 
VEGFRl has an important role in cell migration and vessel 
formation in gastric ulcer healing [52]. Additionally, VEGFRl 
expression on highly metastatic 3LL-LLC tumor cells induces 
MMP9 expression in premetastatic lung endothelial cells and 
macrophages. MMP9 then breaks down the extracellular matrix 
and allows tumor cell migration [53]. It should be noted that 
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Figure 12. Effect of tEC receptor levels on anti-VEGF treatment at 6 weeks of tumor growth. Free VEGF in the diseased tissue 
compartment in response to anti-VEGF treatment after updating (A) VEGFRl alone, (B) VEGFR2 alone, and (C) both receptors on the tECs at 6 weeks 
of tumor growth. "Density 1" (D1) corresponds to the Gaussian with highest weight in the mixture model, "Density 2" (D2) is the second highest 
weight and "Density 3" (D3) is the lowest weight. "IVlixture" was obtained by summing the geometric means of each density distribution weighted by 
their density in the mixture model. The geometric mean was used for all receptor distributions (1,500 VEGFRI/Mixture, 1,100 VEGFR2/Mixture, 600 
VEGFRl /D1, 600 VEGFR2/D1, 1,700 VEGFRl /D2, 2,400 VEGFR2/D2, 12,250 VEGFRl /D3, 600 VEGFR2/D3). The control reflects previously published 
VEGFRl and VEGFR2 levels (1,100 VEGFRI/tEC and 700 VEGFR2/tEC) [23]. 
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VEGFRl does not appear to modulate neuronal migration - lowly 
expressed VEGFRl on cortical neuron cells was blocked with a 
polyclonal antibody and neurogenesis was not hindered [54]. 

Likewise, neurons expressed no VEGFRl mRNA when the cells 
were treated with VEGF-Aigs, indicating VEGFRl is unimpor- 
tant for neurogenesis [55]. Altogether, our results, and these 
previous studies, support the idea that VEGFRl is upregulated at 
early angiogenesis stages and is a crucial regulator of cell 
migration. 

The healthy body model assures physiological relevancy 

For systems biology studies, building and testing model modules 
assures that individual model pieces give physiologically relevant 
results before adding additional modules. The healthy body model 
was used, as it allows the heterogeneity effects to be observed in a 
simplified model. This allowed us to sequentially insert experi- 
mentally obtained receptor levels on healthy endothelial cells and 
observe the effect in the healthy tissue compartment they reside in, 
without any cross talk from the tumor. Initial model versions 
contained a single, normal tissue compartment, and provided 
insight into the importance in VEGFR levels in determining 
VEGF concentration [56] . This iterative process allows individual 
molecules or parameters to be understood, as well as provides a 
troubleshooting basis for more complex models. 

Tumor associated cells control drug efficacy 

VEGFRl /2 levels on tumor cells and tECs play an important 
role in determining drug efficacy. By observing free VEGF levels 
in the diseased tissues compartment, we have shown that anti- 
VEGF treatment is ineffective against specific tumor cell and tEC 
subpopulations that have high expression levels. These high 
expression levels consequently increased the membrane protein 
insertion rate, and we showed that decreasing the insertion rate 
recovered the anti-angiogenic result from anti-VEGF treatment. 
For these subpopulations, receptor mRNA could be targeted to 
reduce receptor levels or the insertion rate could be decreased by 
inhibiting insertion proteins [57]. tECs can also form various vessel 
types, creating a highly heterogeneous tumor vasculature [58] . It is 
possible the formed blood vessel t}pe varies with time, with early- 
vessels being more responsive to anti-VEGF treatment and late 
vessels portraying anti-VEGF resistance [59]. The hierarchy of 
large arteries bifurcating into successively smaller conduits present 
in normal blood vessel networks is missing in tumor vasculature, 
which often lack sufficient blood flow, pericyte coverage, and 
exhibit leakiness and dilation [60,61]. These changes in tumor 
vasculature could account for the heterogeneous subpopulations 
that we observe in tECs. One study aiming to characterize cell 
subpopulation heterogeneity decomposed populations based on 
basal signaling markers and showed cells with similar basal 
signahng to have similar drug sensitivities [62], while others have 
characterized subpopulations through signaling markers after 
applying drug treatment [63]. Singh el al noted that characterizing 
subpopulations in an ensemble manner may be required to 
distinguish biological or functional differences in subpopulations. 
These tEC subpopulations and heterogeneities in vessel formation 
and properties indicate a need to better characterize cell 
subpopulations and their potential responses to drug treatment. 
Our work provides insight into subpopulation dispersion with 
regards to VEGFR membrane localization and how this affects 
drug treatment. 



Incorporating cellular heterogeneity with signaling 

pathways 

We present here how receptor heterogeneity affects ligand 
binding kinetics and ligand distributions. The model implemented 
here is a simplified angiogenesis model as it does not account for 
receptor trafficking or cell signaling pathways. However, compu- 
tational studies require building models in an iterative manner to 
ensure physiological relevancy, and the lack of previous studies 
computationally exploring cellular heterogeneity necessitates this 
simplified model. For instance, early computational models sought 
to reveal the mechanism of EGFR activation via EGF binding 
using simplified models to examine the interaction kinetics [64,65] . 
Later models then sought to better understand the cellular 
complexity pertaining to EGFR activation by mapping the 
downstream signaling and receptor internalization [19,66,67], 
However, VEGF models have not yet reached the sophistication 
that EGF models have [68], highlighting the necessity to examine 
VEGF model sensitivity. Prior sensitivity analysis revealed that 
VEGFRs are the primary controller of free VEGF levels [56]. 
Early VEGF models first examined ligand binding kinetics to 
VEGFR [69] or receptor dimerization kinetics [70] in simplified 
models. Later studies increased complexity by modeling tumor 
angiogenesis to examine the effect of the tumor microenvironment 
on anti-VEGF treatment efficacy [71,72], and included targeting 
specific VEGF isoforms [73]. Recent models incorporated 
receptor trafficking and specific signaling pathways to better 
understand cellular activity upon ligand binding [41,68] using 
ensemble a\eraged receptor levels. Therefore, accurate VEGFR 
representations are necessary for accurate model development. 
Our ability to obtain these experimental data on a ceU-by-ceU level 
allowed us to now examine the most appropriate way to analyze 
the data, represent the data, and extract representative parame- 
ter(s) for use in models. Here, we show that these are critical steps 
for accurate model development. We similarly show how data 
representation affects the ligand binding kinetics and ligand 
distribution in a simplified model. We use free VEGF levels as the 
functional model output, since it is the primary signaling molecule 
in angiogenesis [74], its secretion by tumor mediates tumor growth 
and metastasis [75], and a meta-analysis has shown that VEGF 
levels in blood and serum is significantly elevated in cancer 
patients [76]. We believe that our work presents a novel 
foundation for understanding heterogeneity, particularly in tumor, 
and with such a foundation, we can now incorporate downstream 
signaling for more reliable predictive power. 

Concluding remarks 

Here we show how the choice of data representation in 
heterogeneous populations can affect anti-angiogenic drug efficacy 
predictions. While we select a single value from the populations to 
represent the heterogeneity (every cell in the population contains 
that representative parameter), we present significant analysis to 
support the parameter representation; thereby establishing the 
need to fully analyze experimental data and comprehensively 
identify the effect of data representation on model predictions. 
With the advent of high-throughput cell-by-cell data, compart- 
ments should incorporate these cell-by-cell data as individual cells 
with simulations probing the aggregation of these cell-by-cell 
dynamics. Such approaches, apphed to our system, would require 
expanding the single normal tissue compartment to A + 1 
compartments for h healthy endothelial cells, expanding the 
diseased tissue compartment to t-\-d+ \ compartments for d tECs 
and t tumor cells, and expanding the blood compartment to 
h + t+ \ compartments. Even for relatively low numbers of cells, 
say 100,000 of each type, we estimate a single simulation could 
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take upwards of 170 hours/? days on the system used for this 
study. This increased complexity would require supercomputing, 
split among multiple processors to obtain reasonable solution 
times. Extensions such as these would advance new insight into 
how a system of distinct cells interacts to achieve a specific 
response. 

Supporting Information 

Figure SI Statistical distribution fits to ex vivo receptor 
populations. CeU-by-cell analysis of VEGFRl/2 distributions 
from C57BL/6 (A-B) and BALB/c (C-D) mice. Each distribution 

was fit to WeibuU (generalized extreme value distribution), 
Gamma (maximum entropy probability distribution), and lognor- 
mal (logarithm is normally distributed) probability density 
functions. The parameters for the best fit distributions are given 
in Table 2. 
(TIF) 

Figure S2 Receptor level effect on £ree VEGF levels in 
the healthy body model. Comparison of adding ex vivo or in 
vitro rc-ceptor levels in the healthy body model. VEGF 
concentration was initialized at 0 pM and simulated to steady 
state. The geometric mean (2,100 VEGFR1/C57, 1,540 
VEGFR2/C57, 2,700 VEGFRl/BAL, 1,900 VEGFR2/BAL, 
2,530 VEGFRl/HUVEC, 5,260 VEGFR2/HUVEC) and mode 
(1,820 VEGFR1/C57, 2,860 VEGFR2/C57, 1,700 VEGFRl/ 
BAL, 1,200 VEGFR2/BAL, 2,720 VEGFRl/HUVEC, 11,400 
VEGFR2/HUVEC) of the distributions were used. (A,D) Free 
VEGF response in updating VEGFRl alone, (B,E) updating 
VEGFR2 alone, and (C,F) updating both simultaneously. The 
control reflects previously published VEGFRl and VEGFR2 
levels (1,100 VEGFRl/cell and 700 VEGFR2/cell) [23]. 
(TIF) 

Figure S3 Receptor level effect on fi-ee VEGF levels in 
the tumor model. Comparison of adding ex vivo or in vitro 
receptor levels in the tumor model. An optimized anti-VEGF 
agent is added at t = 0 and the VEGF concentration response is 
simulated to 3 weeks after injection. The geometric mean (2,100 
VEGFR1/C57, 1,540 VEGFR2/C57, 2,700 VEGFRl/BAL, 
1,900 VEGFR2/BAL, 2,530 VEGFRl/HUVEC, 5,260 
VEGFR2/HUVEC) and mode (1,820 VEGFR1/C57, 2,860 
VEGFR2/C57, 1,700 VEGFRl/BAL, 1,200 VEGFR2/BAL, 
2,720 VEGFRl/HUVEC, 11,400 VEGFR2/HUVEC) of the 
distributions were used. (A-C) Free VEGF response in updating 
VEGFRl alone, (D-F) updating VEGFR2 alone, and (G-I) 
updating both simultaneously. The control reflects previously 
published VEGFRl and VEGFR2 levels (1,100 VEGFRl/ceU 
and 700 VEGFR2/ceU) [23]. 
(TIF) 
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Table SI Percent of each data set defined as outliers. 
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in percent defined as oudiers between low bin search and 3 STD is 
7.59%. 
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